
Faster Biclique Mining in Near-Bipartite Graphs

Conference paper, Analysis of Experimental Algorithms (SEA 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11544)

Abstract

Identifying dense bipartite subgraphs is a common graph data mining task. Many applications focus on the enumeration of all maximal bicliques (MBs), though sometimes the stricter variant of maximal induced bicliques (MIBs) is of interest. Recent work of Kloster et al. introduced a MIB-enumeration approach designed for “near-bipartite” graphs, where the runtime is parameterized by the size k of an odd cycle transversal (OCT), a vertex set whose deletion results in a bipartite graph. Their algorithm was shown to outperform the previously best known algorithm even when k was logarithmic in |V|. In this paper, we introduce two new algorithms optimized for near-bipartite graphs: one that enumerates MIBs in time \(O(M_I |V| |E| k)\), and another, based on the approach of Alexe et al., that enumerates MBs in time \(O(M_B |V| |E| k)\), where \(M_I\) and \(M_B\) denote the number of MIBs and MBs in the graph, respectively. We implement all of our algorithms in open-source C++ code and experimentally verify that the OCT-based approaches are faster in practice than the previously existing algorithms on graphs with a wide variety of sizes, densities, and OCT decompositions.
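As a small illustration of the OCT definition above, the sketch below checks whether a given vertex set is an odd cycle transversal by 2-coloring the remaining graph with breadth-first search. It assumes an undirected graph stored as a Python dict mapping each vertex to the set of its neighbors; `is_oct` is a hypothetical helper, not part of the authors' C++ code, and it only verifies a candidate set rather than computing a small OCT.

```python
from collections import deque


def is_oct(adj, oct_set):
    """Hypothetical helper: True iff deleting oct_set leaves a bipartite graph.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    """
    color = {}
    for start in adj:
        if start in oct_set or start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:  # BFS 2-coloring of one component of G - oct_set
            u = queue.popleft()
            for w in adj[u]:
                if w in oct_set:
                    continue  # deleted vertex: ignore its edges
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False  # an odd cycle survives the deletion
    return True
```

For example, a triangle is not bipartite, but deleting any one of its vertices leaves a single edge, so that vertex forms an OCT of size 1.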

This work was supported by the Gordon & Betty Moore Foundation’s Data-Driven Discovery Initiative under Grant GBMF4560 to Blair D. Sullivan and the NC State College of Engineering REU program.


References

  1. Agarwal, P., Alon, N., Aronov, B., Suri, S.: Can visibility graphs be represented compactly? Discret. Comput. Geom. 12, 347–365 (1994)

  2. Akiba, T., Iwata, Y.: Branch-and-reduce exponential/FPT algorithms in practice: a case study of vertex cover. Theoret. Comput. Sci. 609, 211–225 (2016)

  3. Alexe, G., Alexe, S., Crama, Y., Foldes, S., Hammer, P., Simeone, B.: Consensus algorithms for the generation of all maximal bicliques. Discret. Appl. Math. 145, 11–21 (2004)

  4. Dawande, M., Keskinocak, P., Swaminathan, J., Tayur, S.: On bipartite and multipartite clique problems. J. Algorithms 41, 388–403 (2001)

  5. Dias, V., De Figueiredo, C., Szwarcfiter, J.: Generating bicliques of a graph in lexicographic order. Theoret. Comput. Sci. 337, 240–248 (2005)

  6. Eppstein, D.: Arboricity and bipartite subgraph listing algorithms. Inf. Process. Lett. 51, 207–211 (1994)

  7. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco (1979)

  8. Gély, A., Nourine, L., Sadi, B.: Enumeration aspects of maximal cliques and bicliques. Discret. Appl. Math. 157(7), 1447–1459 (2009)

  9. Goodrich, T., Horton, E., Sullivan, B.: Practical graph bipartization with applications in near-term quantum computing. arXiv preprint arXiv:1805.01041 (2018)

  10. Gülpinar, N., Gutin, G., Mitra, G., Zverovitch, A.: Extracting pure network submatrices in linear programs using signed graphs. Discret. Appl. Math. 137, 359–372 (2004)

  11. Horton, E., Kloster, K., Sullivan, B.D., van der Poel, A., Woodlief, T.: MI-bicliques: Version 2.0, August 2019. https://doi.org/10.5281/zenodo.3381532

  12. Hüffner, F.: Algorithm engineering for optimal graph bipartization. In: Nikoletseas, S.E. (ed.) WEA 2005. LNCS, vol. 3503, pp. 240–252. Springer, Heidelberg (2005). https://doi.org/10.1007/11427186_22

  13. Chang, W.: Maximal biclique enumeration, December 2004. http://genome.cs.iastate.edu/supertree/download/biclique/README.html

  14. Iwata, Y., Oka, K., Yoshida, Y.: Linear-time FPT algorithms via network flow. In: SODA, pp. 1749–1761 (2014)

  15. Kaytoue-Uberall, M., Duplessis, S., Napoli, A.: Using formal concept analysis for the extraction of groups of co-expressed genes. In: Le Thi, H.A., Bouvry, P., Pham Dinh, T. (eds.) MCO 2008. CCIS, vol. 14, pp. 439–449. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87477-5_47

  16. Kaytoue, M., Kuznetsov, S., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181, 1989–2011 (2011)

  17. Kloster, K., Sullivan, B., van der Poel, A.: Mining maximal induced bicliques using odd cycle transversals. In: Proceedings of the 2019 SIAM International Conference on Data Mining (2019, to appear)

  18. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the web for emerging cyber-communities. Comput. Netw. 31, 1481–1493 (1999)

  19. Kuznetsov, S.: On computing the size of a lattice and related decision problems. Order 18, 313–321 (2001)

  20. Li, J., Liu, G., Li, H., Wong, L.: Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: a one-to-one correspondence and mining algorithms. IEEE Trans. Knowl. Data Eng. 19, 1625–1637 (2007)

  21. Lokshtanov, D., Saurabh, S., Sikdar, S.: Simpler parameterized algorithm for OCT. In: Fiala, J., Kratochvíl, J., Miller, M. (eds.) IWOCA 2009. LNCS, vol. 5874, pp. 380–384. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10217-2_37

  22. Makino, K., Uno, T.: New algorithms for enumerating all maximal cliques. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 260–272. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27810-8_23

  23. Mushlin, R., Kershenbaum, A., Gallagher, S., Rebbeck, T.: A graph-theoretical approach for pattern discovery in epidemiological research. IBM Syst. J. 46, 135–149 (2007)

  24. Panconesi, A., Sozio, M.: Fast Hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS, vol. 3240, pp. 266–277. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30219-3_23

  25. Peeters, R.: The maximum edge biclique problem is NP-complete. Discret. Appl. Math. 131, 651–654 (2003)

  26. Sanderson, M., Driskell, A., Ree, R., Eulenstein, O., Langley, S.: Obtaining maximal concatenated phylogenetic data sets from large sequence databases. Mol. Biol. Evol. 20, 1036–1042 (2003)

  27. Schrook, J., McCaskey, A., Hamilton, K., Humble, T., Imam, N.: Recall performance for content-addressable memory using adiabatic quantum optimization. Entropy 19, 500 (2017)

  28. Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 505–517 (1977)

  29. Wernicke, S.: On the algorithmic tractability of single nucleotide polymorphism (SNP) analysis and related problems (2014)

  30. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets. NATO Advanced Study Institutes Series (Series C – Mathematical and Physical Sciences), vol. 83, pp. 445–470. Springer, Dordrecht (1982). https://doi.org/10.1007/978-94-009-7798-3_15

  31. Yannakakis, M.: Node- and edge-deletion NP-complete problems. In: STOC, pp. 253–264 (1978)

  32. Zhang, Y., Phillips, C.A., Rogers, G.L., Baker, E.J., Chesler, E.J., Langston, M.A.: On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC Bioinform. 15, 110 (2014)

Author information

Corresponding author: Andrew van der Poel.

Appendices

A MIB-Enumeration Framework Subroutines

We now provide algorithmic details and proofs of the complexity and correctness of MakeIndMaximal and AddTo.

A.1 MakeIndMaximal

Recall that MakeIndMaximal takes as input (C, S), where C is an induced biclique and \(S \subseteq V\), and either returns a MIB \(C^+\) where \(C \subseteq C^+\), \(C^+ \subseteq C \cup S\), \(C \ne \emptyset \), or returns \(\emptyset \). If it returns \(\emptyset \) and \(C \ne \emptyset \), then there is another MIB D which contains C and some vertex \(v \in (V \setminus S) \setminus C\). We give pseudo-code of MakeIndMaximal in Algorithm 3.

[Algorithm 3: pseudo-code of MakeIndMaximal]
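To make the contract recalled above concrete, here is a minimal sketch assuming an undirected graph stored as a Python dict mapping each vertex to its set of neighbors. It is not the authors' Algorithm 3: it greedily adds vertices of S until none fits, then returns None (standing in for \(\emptyset \)) if some vertex outside S could still join, and it makes no attempt to meet the O(m) bound of Lemma 7. The names `can_add` and `make_ind_maximal` are hypothetical.

```python
def can_add(adj, own, other, v):
    """True iff v may join side `own` of the induced biclique own x other:
    v is outside the biclique, has no neighbor in `own`, and is adjacent
    to every vertex of `other`."""
    return (v not in own and v not in other
            and not (adj[v] & own) and other <= adj[v])


def make_ind_maximal(adj, c1, c2, s):
    """Hypothetical sketch of the MakeIndMaximal contract (not Algorithm 3).

    Greedily extends the induced biclique c1 x c2 with vertices of s until
    no vertex of s fits; returns the extended pair, or None if some vertex
    outside s could still be added.
    """
    c1, c2 = set(c1), set(c2)
    changed = True
    while changed:  # repeat until a full pass over s adds nothing
        changed = False
        for v in s:
            if can_add(adj, c1, c2, v):
                c1.add(v)
                changed = True
            elif can_add(adj, c2, c1, v):
                c2.add(v)
                changed = True
    for v in adj:
        if v not in s and (can_add(adj, c1, c2, v) or can_add(adj, c2, c1, v)):
            return None  # a vertex outside s extends it: not maximal in G
    return c1, c2
```

On the 4-cycle 0-1-2-3-0 with C = {0} × {1} and S = {2, 3}, the sketch returns ({0, 2}, {1, 3}); with S empty it returns None, because vertex 2 (outside S) could still join the first side.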

Lemma 5

MakeIndMaximal returns a MIB \(C^+\) where \(C \subseteq C^+\), \(C^+ \subseteq C \cup S\), \(C \ne \emptyset \), or returns \(\emptyset \).

Proof

Referring to the pseudo-code in Algorithm 3, it is clear that \(C \subseteq C^+\), as no vertices are ever removed from the input biclique C. Furthermore, the only vertices added to \(C^+\) are from S, so \(C^+ \subseteq C \cup S\), and \(C^+\) is the only biclique returned by MakeIndMaximal. Neither side of C is empty, and every vertex added is independent from the side of the biclique it joins (and, as required for a biclique, adjacent to every vertex of the opposite side), so if we do not return \(\emptyset \), the object returned is an induced biclique. Finally, if no vertex from outside of S can be added to \(C^+\), then we do not return \(\emptyset \), and thus \(C^+\) is maximal.

Lemma 6

If MakeIndMaximal returns \(\emptyset \) and \(C \ne \emptyset \), then there is another MIB D in G which contains C and some vertex \(v \in (V \setminus S) \setminus C\).

Proof

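Note that \(C \subseteq C^* = C_1 \times C_2\) at line 12. As MakeIndMaximal returns \(\emptyset \), there must be a vertex \(v \in V_S = V \setminus (S \cup C^*)\) which can be added to \(C^*\); since \(C \subseteq C^*\), this vertex lies in \((V \setminus S) \setminus C\). Let D be a MIB containing \(C^*\) and v, which exists because the induced biclique obtained by adding v to \(C^*\) can be extended to a maximal one. Then D contains C and v, which proves the lemma.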

Lemma 7

MakeIndMaximal runs in O(m) time.

Proof

Note that because G is connected, \(n \in O(m)\). Setting \(C_S\) and \(V_S\) can be done in O(n) time. In each for loop, we can scan all of the edges incident to each v in the iterated-over set and keep count of how many vertices from \(C_i\) have been seen (checking for inclusion can be done in O(1) time with an O(n) initialization step). Thus, each edge is scanned at most once per for loop, and since there is a constant number of for loops, the total running time is \(O(n + m) = O(m)\).
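The counting argument in this proof can be sketched as follows, again assuming a dict of neighbor sets and using Python set membership in place of the O(n)-initialized lookup array. The hypothetical `addable_vertices` scans each candidate's incident edges once and reports which candidates could join \(C_1\) or \(C_2\): a vertex can join \(C_1\) exactly when it has no neighbor in \(C_1\) and is adjacent to all \(|C_2|\) vertices of \(C_2\), and symmetrically for \(C_2\).

```python
def addable_vertices(adj, c1, c2, candidates):
    """Hypothetical sketch: scan each candidate's incident edges once,
    counting neighbors in c1 and c2, and return the candidates that could
    join side 1 and side 2 of the induced biclique c1 x c2, respectively."""
    to_side1, to_side2 = [], []
    for v in candidates:
        if v in c1 or v in c2:
            continue  # already in the biclique
        in1 = in2 = 0
        for w in adj[v]:  # one pass over v's incident edges
            if w in c1:
                in1 += 1
            elif w in c2:
                in2 += 1
        # v joins side 1 iff it sees none of c1 and all of c2 (and symmetrically)
        if in1 == 0 and in2 == len(c2):
            to_side1.append(v)
        if in2 == 0 and in1 == len(c1):
            to_side2.append(v)
    return to_side1, to_side2
```

The sketch assumes both sides are nonempty, as for the inputs of MakeIndMaximal; with an empty side, a vertex could pass both tests.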

A.2 AddTo

Recall that AddTo takes as input (C, v), where \(C=C_1 \times C_2\) is an induced biclique and \(v \in V \setminus (C_1 \cup C_2)\). If \(C_2 \setminus \overline{N}(v) \ne \emptyset \), it returns the induced biclique obtained by adding v to \(C_1\), removing N(v) from \(C_1\), and removing \(\overline{N}(v)\) from \(C_2\); otherwise it returns \(\emptyset \). We give pseudo-code of AddTo in Algorithm 4.

[Algorithm 4: pseudo-code of AddTo]
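A direct transcription of the AddTo contract as stated above, assuming \(C_1\) and \(C_2\) are Python sets and `adj` maps each vertex to its set of neighbors. The name `add_to` is hypothetical, None stands in for \(\emptyset \), and Python's set operations run in expected time linear in the set sizes rather than via the exact edge scan of Lemma 9.

```python
def add_to(adj, c1, c2, v):
    """Hypothetical sketch of AddTo: put v on side 1, drop v's neighbors
    from side 1 and v's non-neighbors from side 2. Returns None when
    side 2 would become empty."""
    assert v not in c1 and v not in c2, "v must lie outside the biclique"
    n_v = adj[v]
    new_c2 = c2 & n_v  # C2 minus the non-neighbors of v
    if not new_c2:
        return None
    new_c1 = (c1 - n_v) | {v}  # C1 minus N(v), plus v itself
    return new_c1, new_c2
```

On the 4-cycle 0-1-2-3-0 with C = {0} × {1}, adding vertex 2 yields ({0, 2}, {1}), while adding vertex 3 returns None because 3 is not adjacent to 1.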

Lemma 8

AddTo returns the induced biclique where v is added to \(C_1\), N(v) is removed from \(C_1\), and \(\overline{N}(v)\) is removed from \(C_2\) if \(C_2 \setminus \overline{N}(v) \ne \emptyset \), and \(\emptyset \) otherwise.

Proof

Referring to the pseudo-code in Algorithm 4, it is clear that v is added to \(C_1\) and N(v) is removed from \(C_1\). Additionally, v’s non-neighbors are removed from \(C_2\) by intersecting it with N(v). If \(C_2' = \emptyset \), then \(C_2 \setminus \overline{N}(v) = \emptyset \) and \(\emptyset \) is returned. Otherwise, \(C_1' \ne \emptyset \) since it includes v, and \(C_1' \times C_2'\) is a biclique: every vertex of \(C_1' \setminus \{v\} \subseteq C_1\) is adjacent to all of \(C_2' \subseteq C_2\), and v is adjacent to all of \(C_2' \subseteq N(v)\). It is moreover an induced biclique, since \(C_1' \setminus \{v\} \subseteq C_1\) and \(C_2' \subseteq C_2\) are independent because \(C_1 \times C_2\) is an induced biclique, and \(N(v) \cap C_1' = \emptyset \) by construction.

Lemma 9

AddTo runs in O(m) time.

Proof

Note that because G is connected, \(n \in O(m)\). AddTo can be completed by scanning all of v’s O(m) incident edges in tandem with an O(n) preprocessing step to allow for constant-time look-ups when checking for inclusion in a set.

B MB-Enumeration Framework Subroutines

We give a detailed description of the MakeMaximal and Consensus subroutines used in OCT-MICA, along with arguments for their correctness and complexity.

B.1 MakeMaximal

Extending a biclique to a maximal one differs between the non-induced and induced cases, since an MB is completely characterized by one of its sides: in an MB \(X \times Y\), Y is exactly the set of vertices adjacent to every vertex of X, and vice versa.

[Pseudo-code of MakeMaximal]

Lemma 10

MakeMaximal runs in O(m) time.

Proof

To form \(X^*\), we can scan the edges incident to each \(v \in Y\) and keep, for each vertex, a count of how many vertices of Y it is adjacent to (checking for inclusion can be done in O(1) time with an O(n) initialization step). The same can be done for \(Y^*\), where instead we scan the edges incident to each \(v \in X^*\). Thus, each edge is scanned at most twice in MakeMaximal.
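A minimal sketch of MakeMaximal along these lines, assuming that \(X^*\) is the set of vertices adjacent to all of Y and \(Y^*\) the set of vertices adjacent to all of \(X^*\), which is one way to realize the characterization of MBs by one side. The dict-of-neighbor-sets graph and the names `common_neighbors` and `make_maximal` are hypothetical. Each call to `common_neighbors` scans every edge incident to its input side once, so the two calls together touch each edge at most twice.

```python
from collections import Counter


def common_neighbors(adj, side):
    """Vertices adjacent to every vertex of the nonempty set `side`: scan the
    edges incident to `side` once, counting how often each endpoint is seen."""
    seen = Counter()
    for u in side:
        for w in adj[u]:
            seen[w] += 1
    return {w for w, count in seen.items() if count == len(side)}


def make_maximal(adj, x, y):
    """Hypothetical sketch: extend the biclique x * y (both sides nonempty)
    to a maximal biclique X* * Y*."""
    x_star = common_neighbors(adj, y)       # everything adjacent to all of Y
    assert x <= x_star, "x * y must be a biclique"
    y_star = common_neighbors(adj, x_star)  # everything adjacent to all of X*
    return x_star, y_star
```

In a triangle, starting from {0} × {1} gives ({0, 2}, {1}); since bicliques here need not be induced, the edge between 0 and 2 is allowed.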

B.2 Consensus

The MICA component of OCT-MICA relies heavily on the Consensus operation introduced in [3] to find new candidate bicliques. For each pair of bicliques, there are four candidate bicliques which form the consensus of the pair; any of the four candidates may be empty, in which case it is discarded. Consensus runs in O(n) time using standard techniques for set union and intersection.

[Pseudo-code of Consensus]
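The four candidates can be written down with plain set operations. The sketch below follows our reading of the consensus of Alexe et al. [3] for bicliques \((X_1, Y_1)\) and \((X_2, Y_2)\), with sets as Python sets; the function name `consensus` is hypothetical. Each candidate is a biclique whenever both of its sides are nonempty, because every pair of vertices it places on opposite sides already lies on opposite sides of one of the two inputs (possibly with that input's sides swapped).

```python
def consensus(b1, b2):
    """Hypothetical sketch: the four consensus candidates of the bicliques
    b1 = (X1, Y1) and b2 = (X2, Y2); candidates with an empty side are
    discarded."""
    (x1, y1), (x2, y2) = b1, b2
    candidates = [
        (x1 | x2, y1 & y2),  # union of left sides, intersection of right sides
        (x1 & x2, y1 | y2),  # intersection of left sides, union of right sides
        (x1 | y2, y1 & x2),  # same two rules with b2's sides swapped
        (x1 & y2, y1 | x2),
    ]
    return [(a, b) for a, b in candidates if a and b]
```

For instance, combining {0} × {1} with {1} × {2} on the 4-cycle 0-1-2-3-0 keeps only the third candidate, {0, 2} × {1}, which uses the second biclique with its sides swapped.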

C Additional Enumeration Experiments

Here we include figures and tables with additional experimental results from our initial benchmarking and from the computational biology data of [29], described in Sects. 5.2 and 5.4, respectively (Figs. 5–22 and Tables 2 and 3).

Fig. 5. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 10\). The ratio \(n_L/n_R\) was varied.

Fig. 6. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\) and \(n_O = 10\). The ratio \(n_L/n_R\) was varied.

Fig. 7. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 19 \approx 3\log _3(n_B)\). The ratio \(n_L/n_R\) was varied.

Fig. 8. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\) and \(n_O = 14 \approx 3\log _3(n_B)\). The ratio \(n_L/n_R\) was varied.

Fig. 9. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 10\). The coefficient of variation between L and R was varied.

Fig. 10. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\) and \(n_O = 10\). The coefficient of variation between L and R was varied.

Fig. 11. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 19 \approx 3\log _3(n_B)\). The coefficient of variation between L and R was varied.

Fig. 12. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\) and \(n_O = 14 \approx 3\log _3(n_B)\). The coefficient of variation between L and R was varied.

Fig. 13. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O\) was varied.

Fig. 14. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 19 \approx 3\log _3(n_B)\). The expected edge density between O and \(\{L,R\}\) was varied.

Fig. 15. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 10\). The expected edge density within O was varied.

Fig. 16. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 1000\) and \(n_O = 19 \approx 3\log _3(n_B)\). The expected edge density within O was varied.

Fig. 17. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 150\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, except within O, where it was fixed at 0.05.

Fig. 18. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 150\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, including within O.

Fig. 19. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, except within O, where it was fixed at 0.05.

Fig. 20. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 200\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, including within O.

Fig. 21. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 300\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, except within O, where it was fixed at 0.05.

Fig. 22. Runtimes of the MIB-enumerating (left) and MB-enumerating (right) algorithms on graphs where \(n_B = 300\), \(n_L = n_R\), and \(n_O = 5\). The expected edge density was varied throughout the graph, including within O.

Table 2. Runtimes (rounded to the nearest thousandth of a second) of the biclique-enumeration algorithms on the Afro-American subset of the Wernicke-Hüffner computational biology data [29].

Table 3. Runtimes (rounded to the nearest thousandth of a second) of the biclique-enumeration algorithms on the Japanese subset of the Wernicke-Hüffner computational biology data [29].


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Sullivan, B.D., van der Poel, A., Woodlief, T. (2019). Faster Biclique Mining in Near-Bipartite Graphs. In: Kotsireas, I., Pardalos, P., Parsopoulos, K., Souravlias, D., Tsokas, A. (eds) Analysis of Experimental Algorithms. SEA 2019. Lecture Notes in Computer Science, vol. 11544. Springer, Cham. https://doi.org/10.1007/978-3-030-34029-2_28


  • DOI: https://doi.org/10.1007/978-3-030-34029-2_28


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34028-5

  • Online ISBN: 978-3-030-34029-2

